Learning with Partial Observations in General-sum Stochastic Games
Abstract
In many situations, multiagent systems must deal with the partial observability that agents have of their environment. In such cases, finding optimal solutions is often intractable for more than two agents, and approximate solutions are often the only practical way to solve these problems. The model commonly used to represent this kind of problem is the Partially Observable Stochastic Game (POSG). Such a model is usually solved by approaches in which the environment dynamics are known. In this article, we use observability itself as a way to approximate the multiagent solution in cases where agents learn through interaction with an environment whose dynamics are unknown. We restrict the notion of observability to mutual observability, where each agent can see a subpart of the environment and all agents together have a complete perception of it. We present a class of reinforcement learning algorithms based on equilibrium solutions and show how those solutions can be approximated on some single-step games and on more general stochastic games. We present results using different known equilibria (Nash and correlated) and show that a limited neighbourhood makes it possible to obtain either equilibrium solutions or approximate equilibrium solutions while computing equilibria over only a limited number of agents.
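The class of algorithms described in the abstract combines Q-learning with the computation of a game-theoretic equilibrium on the stage games induced by the learned Q-values. As a rough illustration only, the sketch below shows a correlated-equilibrium Q-learning ("CE-Q" style) backup for two agents; the names (`utilitarian_ce`, `ceq_update`), the utilitarian equilibrium-selection rule, and the learning parameters are illustrative assumptions, and the paper's restriction to limited observation neighbourhoods is not modelled here.

```python
# Minimal sketch of an equilibrium-based Q-learning backup (CE-Q style) for
# two agents.  Requires numpy and scipy; everything here is an illustrative
# assumption, not the paper's implementation.
import numpy as np
from scipy.optimize import linprog

def utilitarian_ce(Q1, Q2):
    """Utilitarian correlated equilibrium of the stage game with payoff
    matrices Q1, Q2 (shape n1 x n2): maximise the sum of expected payoffs
    subject to the correlated-equilibrium incentive constraints."""
    n1, n2 = Q1.shape
    nvar = n1 * n2                       # one probability per joint action
    c = -(Q1 + Q2).flatten()             # linprog minimises, so negate

    A_ub, b_ub = [], []
    # Agent 1: following recommendation a must be at least as good as any
    # deviation a_dev, in expectation over the other agent's action.
    for a in range(n1):
        for a_dev in range(n1):
            if a_dev == a:
                continue
            row = np.zeros(nvar)
            for b in range(n2):
                row[a * n2 + b] = Q1[a_dev, b] - Q1[a, b]
            A_ub.append(row); b_ub.append(0.0)
    # Agent 2: symmetric constraints over columns.
    for b in range(n2):
        for b_dev in range(n2):
            if b_dev == b:
                continue
            row = np.zeros(nvar)
            for a in range(n1):
                row[a * n2 + b] = Q2[a, b_dev] - Q2[a, b]
            A_ub.append(row); b_ub.append(0.0)

    res = linprog(c,
                  A_ub=np.array(A_ub) if A_ub else None,
                  b_ub=np.array(b_ub) if b_ub else None,
                  A_eq=np.ones((1, nvar)), b_eq=[1.0],   # probabilities sum to 1
                  bounds=[(0, 1)] * nvar, method="highs")
    p = res.x.reshape(n1, n2)
    return p, float((p * Q1).sum()), float((p * Q2).sum())

def ceq_update(Q1, Q2, s, a1, a2, r1, r2, s_next, alpha=0.1, gamma=0.95):
    """One CE-Q backup: each agent bootstraps on its value under a correlated
    equilibrium of the next-state stage game (Q1[s], Q2[s] are n1 x n2 arrays)."""
    _, v1, v2 = utilitarian_ce(Q1[s_next], Q2[s_next])
    Q1[s][a1, a2] += alpha * (r1 + gamma * v1 - Q1[s][a1, a2])
    Q2[s][a1, a2] += alpha * (r2 + gamma * v2 - Q2[s][a1, a2])
```

The utilitarian selection rule (maximising the sum of payoffs) is only one of several equilibrium-selection choices discussed in the correlated-equilibrium Q-learning literature; swapping in a Nash equilibrium of the stage game gives the Nash-based variant mentioned in the abstract.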
Similar resources
A Maximum Principle for Stochastic Differential Games with g–expectations and partial information
In this paper, we initiate a study of the optimal control problem for stochastic differential games under generalized expectation, via backward stochastic differential equations and partial information. We first prove a sufficient maximum principle for the zero-sum stochastic differential game problem. We then extend our approach to general stochastic differential games (nonzero-sum games), and obtain ...
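The "generalized expectation" (g-expectation) referred to above is Peng's notion defined through a backward stochastic differential equation; the standard definition is recalled below only to clarify the term, and is not copied from the cited abstract.

```latex
% Peng's g-expectation: for a terminal value \xi and a driver g, solve the BSDE
% on [0, T] and define the generalized expectation of \xi as the initial value Y_0.
\mathcal{E}_g[\xi] \;:=\; Y_0,
\qquad
Y_t \;=\; \xi + \int_t^T g(s, Y_s, Z_s)\,ds \;-\; \int_t^T Z_s\,dW_s,
\qquad 0 \le t \le T .
```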
R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The model is initialized in an optimistic fashion: all actions in all states return the maximal possible...
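As a rough sketch of the idea summarised above, the following simplified, discounted variant of R-max keeps an optimistic value for every state-action pair until it has been visited m times, then plans on the empirical model. The class name `RMaxAgent` and the parameter values are illustrative choices, not taken from the cited paper.

```python
# Simplified, discounted R-max sketch: optimistic initialisation plus
# model-based planning once a state-action pair becomes "known".
import numpy as np

class RMaxAgent:
    def __init__(self, n_states, n_actions, r_max, m=10, gamma=0.95):
        self.nS, self.nA = n_states, n_actions
        self.r_max, self.m, self.gamma = r_max, m, gamma
        self.counts = np.zeros((n_states, n_actions, n_states))
        self.reward_sum = np.zeros((n_states, n_actions))
        # Unknown pairs are assumed to yield r_max forever: value r_max/(1-gamma).
        self.Q = np.full((n_states, n_actions), r_max / (1.0 - gamma))

    def act(self, s):
        return int(np.argmax(self.Q[s]))

    def observe(self, s, a, r, s_next):
        self.counts[s, a, s_next] += 1
        self.reward_sum[s, a] += r
        if self.counts[s, a].sum() == self.m:
            self._plan()                 # replan when a pair first becomes known

    def _plan(self, sweeps=200):
        # Value iteration on the empirical model; unknown pairs keep their
        # optimistic value, which is what drives directed exploration.
        for _ in range(sweeps):
            V = self.Q.max(axis=1)
            for s in range(self.nS):
                for a in range(self.nA):
                    n = self.counts[s, a].sum()
                    if n < self.m:
                        continue         # stay optimistic
                    P = self.counts[s, a] / n
                    R = self.reward_sum[s, a] / n
                    self.Q[s, a] = R + self.gamma * P.dot(V)
```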
Policy Invariance under Reward Transformations for General-Sum Stochastic Games
We extend the potential-based shaping method from Markov decision processes to multi-player general-sum stochastic games. We prove that the Nash equilibria of a stochastic game remain unchanged after potential-based shaping is applied to the environment. This property of policy invariance provides a possible way of speeding up convergence when learning to play a stochastic game.
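For reference, the potential-based shaping transformation that this blurb extends to stochastic games has the standard form below (Ng, Harada and Russell, 1999); writing a per-agent potential over the joint state is an assumption about how the multi-player extension is stated.

```latex
% Shaped reward for agent i: add the discounted change of a potential \Phi_i
% over the joint state s; this transformation leaves the equilibria unchanged.
R'_i(s, \vec{a}, s') \;=\; R_i(s, \vec{a}, s') \;+\; \gamma\,\Phi_i(s') \;-\; \Phi_i(s)
```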
Convergence Problems of General-Sum Multiagent Reinforcement Learning
Stochastic games are a generalization of MDPs to multiple agents, and can be used as a framework for investigating multiagent learning. Hu and Wellman (1998) recently proposed a multiagent Q-learning method for general-sum stochastic games. In addition to describing the algorithm, they provide a proof that the method will converge to a Nash equilibrium for the game under specified conditions. T...
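The Hu and Wellman update mentioned above replaces the max operator of single-agent Q-learning with the value of a Nash equilibrium of the next-state stage game; in the two-agent case it can be written as follows (the notation is a standard rendering, not copied from the cited text).

```latex
% Nash-Q backup for agent i: (\pi^1(s'), \pi^2(s')) is a Nash equilibrium of the
% stage game defined by the current Q-tables Q_1^t(s',\cdot,\cdot), Q_2^t(s',\cdot,\cdot).
Q_i^{t+1}(s, a^1, a^2) \;=\; (1 - \alpha_t)\, Q_i^{t}(s, a^1, a^2)
  \;+\; \alpha_t \Big[\, r_i^t + \gamma\, \pi^1(s')^{\!\top} Q_i^{t}(s')\, \pi^2(s') \,\Big]
```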
A Study of Gradient Descent Schemes for General-Sum Stochastic Games
Zero-sum stochastic games are easy to solve, as they can be cast as simple Markov decision processes. This is, however, not the case with general-sum stochastic games. A fairly general optimization-problem formulation for general-sum stochastic games is given by Filar and Vrieze [2004]. However, the optimization problem there has a non-linear objective and non-linear constraints with special s...
Journal title:
Volume, Issue:
Pages: -
Publication date: 2006